Added new problem TD(0) policy evaluation update #517

836hardik-agrawal · 2025-07-29T11:41:12Z

Added a new problem implementation for TD(0) policy evaluation. The function performs a single pass of value updates over an episode of (state, action, reward, next_state) transitions that follow a deterministic policy π. It includes:

1. Core implementation of the TD(0) update rule.

2. Markdown version of the algorithm.

3. Structured test cases with reasoning and expected outputs.
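
For reference, the standard TD(0) update that the list above refers to can be written as follows. This is the textbook form, not copied from the PR's learn.md; the symbols $\alpha$ (step size) and $\gamma$ (discount factor) are the usual ones, and the question is said later in this thread to fix $\gamma = 1$.

```latex
V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma \, V(s_{t+1}) - V(s_t) \right]
% With \gamma = 1 this reduces to the form used in solution.py:
V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + V(s_{t+1}) - V(s_t) \right]
```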

…roblem

moe18

Had a small comment on the TD implementation.

questions/172_td0_value_function_update_for_single_episode/solution.py

836hardik-agrawal · 2025-08-12T06:20:50Z

@moe18
Thanks for reviewing.
The question description specifies a discount factor of 1, which is why gamma is not included in the solution.
If you prefer, I can remove that constraint from the question description and add gamma to the solution.

moe18 · 2025-08-15T15:22:21Z

That makes sense, so I will push the TD question.

moe18

had a few small comments, sorry for the late response back

moe18 · 2025-08-15T15:40:16Z

questions/172_td0_value_function_update_for_single_episode/solution.py

@@ -0,0 +1,3 @@
+def td0_policy_evaluation(episode, V, pi, alpha):
+    for (s, a, r, s_next) in episode:
+        V[s] += alpha * (r + V[s_next] - V[s])


you need to return the V value
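
A minimal sketch of what the requested fix might look like, assuming the function should update `V` in place and also return it. The `gamma` parameter with a default of 1.0 is an assumption added for illustration (the question fixes the discount factor at 1); `pi` is kept only to match the signature in the diff.

```python
def td0_policy_evaluation(episode, V, pi, alpha, gamma=1.0):
    """Single in-order TD(0) pass over one episode; returns the updated V.

    gamma is a hypothetical optional argument; the default of 1.0 matches
    the question's stated constraint. pi is unused and kept for the signature.
    """
    for s, a, r, s_next in episode:
        td_target = r + gamma * V[s_next]      # bootstrapped one-step target
        V[s] += alpha * (td_target - V[s])     # move V[s] toward the target
    return V                                   # same dict object, now updated
```

Because the dict is mutated in place, callers that need the original values should pass a copy, for example `td0_policy_evaluation(episode, dict(V), pi, alpha)`.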

moe18 · 2025-08-15T15:41:37Z

questions/172_td0_value_function_update_for_single_episode/tests.json

+  },
+  {
+    "test": "episode = [\n    ('A', 'left', 5.0, 'B'),\n    ('B', 'right', 0.0, 'C'),\n    ('C', 'down', 1.0, 'terminal')\n]\nV = {'A': 0.0, 'B': 0.0, 'C': 0.0, 'terminal': 0.0}\npi = {'A': 'left', 'B': 'right', 'C': 'down'}\nalpha = 0.5\nV_updated = td0_policy_evaluation(episode, V, pi, alpha)\nprint({k: round(v, 2) for k, v in V_updated.items()})",
+    "expected_output": "{'A': 2.5, 'B': 0.5, 'C': 0.5, 'terminal': 0.0}"


the solution you provided gave this output
{'A': 2.5, 'B': 0.0, 'C': 0.5, 'terminal': 0.0}
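
To see where the two outputs diverge, the test case can be traced step by step. The sketch below uses a hypothetical helper, `td0_trace`, that is not part of the PR; it applies the same in-order update with a discount factor of 1 and records the TD error at each step.

```python
def td0_trace(episode, V, alpha, gamma=1.0):
    """Hypothetical helper: run one in-order TD(0) pass and log each step."""
    V = dict(V)  # work on a copy so the caller's dict is untouched
    steps = []
    for s, _a, r, s_next in episode:
        td_error = r + gamma * V[s_next] - V[s]
        V[s] += alpha * td_error
        steps.append((s, td_error, V[s]))  # (state, TD error, new value)
    return V, steps


episode = [
    ('A', 'left', 5.0, 'B'),
    ('B', 'right', 0.0, 'C'),
    ('C', 'down', 1.0, 'terminal'),
]
V0 = {'A': 0.0, 'B': 0.0, 'C': 0.0, 'terminal': 0.0}
V_final, steps = td0_trace(episode, V0, alpha=0.5)
# steps == [('A', 5.0, 2.5), ('B', 0.0, 0.0), ('C', 1.0, 0.5)]
```

When `B` is updated, `V['C']` is still 0.0 and the reward is 0.0, so the TD error is zero and `V['B']` stays at 0.0. `C` only becomes 0.5 afterwards, which is consistent with the output reported above rather than with the expected output in tests.json.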

moe18 · 2025-08-15T15:41:52Z

questions/172_td0_value_function_update_for_single_episode/learn.md

@@ -0,0 +1,23 @@
+
+# Learn Section


no need to say learn section

moe18

some small changes to the SARSA question, but looks good

moe18 · 2025-08-15T21:31:09Z

questions/173_implement_the_SARSA_Algorithm_on_policy/meta.json

@@ -0,0 +1,15 @@
+{
+  "id": "173",
+  "title": "implement_the_SARSA_Algorithm_on_policy",


No need for the '_' characters; the title should just be "Implement the SARSA Algorithm on policy".

moe18 · 2025-08-15T21:33:50Z

questions/173_implement_the_SARSA_Algorithm_on_policy/tests.json

+  },
+  {
+    "test": "transitions = {\n    ('A', 'x'): (0.0, 'terminal'),\n    ('A', 'y'): (5.0, 'B'),\n    ('B', 'z'): (2.0, 'terminal')\n}\ninitial_states = ['A']\nalpha = 0.4\ngamma = 0.9\nmax_steps = 3\nQ = sarsa_update(transitions, initial_states, alpha, gamma, max_steps)\nfor k in sorted(Q):\n    print(f\"Q{str(k):15} = {Q[k]:.4f}\")",
+    "expected_output": "Q('A', 'x')      = 0.0000\nQ('A', 'y')      = 0.0000\nQ('B', 'z')      = 0.0000"


got this output from the solution:
Q('A', 'x')      = 0.0000
Q('A', 'y')      = 0.0000
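
The solution file is not shown in this thread, so the cause of the missing `Q('B', 'z')` line is a guess: one possibility is that Q entries are only created for states the episode actually visits. The sketch below is a hypothetical `sarsa_update`, not the PR's code, that creates an entry for every (state, action) pair in `transitions` up front. It assumes a purely greedy policy with ties broken by sorted action name, and treats any state with no outgoing transitions as terminal.

```python
def sarsa_update(transitions, initial_states, alpha, gamma, max_steps):
    """Hypothetical SARSA sketch over a deterministic transition table.

    transitions maps (state, action) -> (reward, next_state).
    Assumptions: greedy action selection, ties broken by sorted action name,
    and states without outgoing transitions are terminal with value 0.
    """
    # Pre-create every known pair so unvisited pairs still appear in Q.
    Q = {sa: 0.0 for sa in transitions}

    def greedy(state):
        acts = sorted(a for (s, a) in transitions if s == state)
        if not acts:
            return None  # terminal state: no action available
        # max() returns the first maximal item, so ties go to the first name.
        return max(acts, key=lambda a: Q[(state, a)])

    for start in initial_states:
        state, action = start, greedy(start)
        for _ in range(max_steps):
            if action is None:
                break
            reward, next_state = transitions[(state, action)]
            next_action = greedy(next_state)
            q_next = Q[(next_state, next_action)] if next_action is not None else 0.0
            # On-policy update: bootstrap from the action actually chosen next.
            Q[(state, action)] += alpha * (reward + gamma * q_next - Q[(state, action)])
            state, action = next_state, next_action
    return Q
```

Under these assumptions the test case picks `'x'` from `'A'`, receives reward 0.0, and terminates, so all three entries stay at 0.0 and all three are printed. A policy with exploration (for example epsilon-greedy) would behave differently.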

836hardik-agrawal · 2025-08-16T04:52:16Z

@moe18
Thanks for reviewing. I will update it.

836hardik-agrawal added 3 commits July 29, 2025 17:03

added new problem tdo_value_function_update_for_single_episode

258c81b

Merge branch 'main' of https://github.com/836hardik-agrawal/DML-OpenP…

847db56

…roblem

added new problem implement_the_sarsa_algorithm_on_policy

bd85a5a

moe18 reviewed Aug 12, 2025

moe18 reviewed Aug 15, 2025

Added the PR required changes

d01572a
